
Cloning metadata table does not work and can be dangerous #1309

Closed
keith-turner opened this issue Aug 5, 2019 · 9 comments
Labels
blocker This issue blocks any release version labeled on it.

Comments

@keith-turner
Contributor

Accumulo supports cloning the metadata table. This will create a user table that points to the metadata table's files. When the Accumulo GC processes the root table, it does not look for file references in the metadata table. Therefore the files of the metadata table clone will eventually be deleted by the Accumulo GC, making the clone unusable.

When a clone of the metadata table is deleted, it will normally have no effect on the metadata table files it references. However, this is not always true. Although the situation is very hard to create, there is a reverse case where cloning the metadata table could cause metadata table files to be deleted. The following Accumulo shell commands illustrate one way this could happen.

clonetable accumulo.metadata metaclone1
# the metadata table has a default compaction ratio of 1
config -t metaclone1 -d table.compaction.major.ratio
# make any change to the metaclone and flush it to create a file
table metaclone1
delete 2< srv lock
flush -t metaclone1
clonetable metaclone1 metaclone2
# because metaclone2 references some of metaclone1's files, deleting metaclone1 will insert delete entries
deletetable metaclone1
deletetable metaclone2
# OH NO! the metadata table now has delete entries for its own files, which nothing in the metadata table references

Below is the output of scanning the metadata table after running the commands above. The ~del entries for /accumulo/tables/!0/default_tablet/A000008z.rf and /accumulo/tables/!0/table_info/A0000094.rf are quite worrisome. These two files are where accumulo.metadata is currently storing its data, and since accumulo.metadata does not reference them, they will be deleted.

root@uno> scan -t accumulo.metadata
+rep< srv:dir []    hdfs://localhost:8020/accumulo/tables/+rep/default_tablet
+rep< srv:time []    L0
+rep< ~tab:~pr []    \x00
1< file:hdfs://localhost:8020/accumulo/tables/1/default_tablet/A000008h.rf []    2874,90
1< file:hdfs://localhost:8020/accumulo/tables/1/default_tablet/F000008q.rf []    1302,33
1< file:hdfs://localhost:8020/accumulo/tables/1/default_tablet/F000008w.rf []    918,15
1< last:100014156c8001c []    localhost:9997
1< loc:100014156c8001c []    localhost:9997
1< srv:dir []    hdfs://localhost:8020/accumulo/tables/1/default_tablet
1< srv:flush []    1
1< srv:lock []    tservers/localhost:9997/zlock-0000000000$100014156c8001c
1< srv:time []    M1565023397620
1< ~tab:~pr []    \x00
2< file:hdfs://localhost:8020/accumulo/tables/2/default_tablet/F0000000.rf []    216,1
2< last:100014156c80006 []    localhost:9997
2< loc:100014156c8001c []    localhost:9997
2< srv:dir []    hdfs://localhost:8020/accumulo/tables/2/default_tablet
2< srv:flush []    2
2< srv:lock []    tservers/localhost:9997/zlock-0000000004$100014156c80010
2< srv:time []    M1565020373913
2< ~tab:~pr []    \x00
~delhdfs://localhost:8020/accumulo/tables/!0/default_tablet/A000008z.rf : []
~delhdfs://localhost:8020/accumulo/tables/!0/table_info/A0000094.rf : []
~delhdfs://localhost:8020/accumulo/tables/9/c-00000000 : []
~delhdfs://localhost:8020/accumulo/tables/9/c-00000000/F000008n.rf : []
~delhdfs://localhost:8020/accumulo/tables/b/c-00000000 : []
~delhdfs://localhost:8020/accumulo/tables/b/c-00000000/F0000095.rf : []
~delhdfs://localhost:8020/accumulo/tables/b/c-00000001 : []

Below is a scan of the root table showing that accumulo.metadata is using the two files that have been scheduled for deletion.

root@uno> scan -t accumulo.root
!0;~ file:hdfs://localhost:8020/accumulo/tables/!0/table_info/A0000094.rf []    786,21
!0;~ last:100014156c8001c []    localhost:9997
!0;~ loc:100014156c8001c []    localhost:9997
!0;~ srv:compact []    12
!0;~ srv:dir []    hdfs://localhost:8020/accumulo/tables/!0/table_info
!0;~ srv:flush []    28
!0;~ srv:lock []    tservers/localhost:9997/zlock-0000000000$100014156c8001c
!0;~ srv:time []    L145
!0;~ ~tab:~pr []    \x00
!0< file:hdfs://localhost:8020/accumulo/tables/!0/default_tablet/A000008z.rf []    404,3
!0< last:100014156c8001c []    localhost:9997
!0< loc:100014156c8001c []    localhost:9997
!0< srv:compact []    12
!0< srv:dir []    hdfs://localhost:8020/accumulo/tables/!0/default_tablet
!0< srv:flush []    28
!0< srv:lock []    tservers/localhost:9997/zlock-0000000000$100014156c8001c
!0< srv:time []    L13
!0< ~tab:~pr []    \x01~
~delhdfs://localhost:8020/accumulo/tables/!0/default_tablet/A000008s.rf : []
~delhdfs://localhost:8020/accumulo/tables/!0/table_info/A000008t.rf : []
~delhdfs://localhost:8020/accumulo/tables/!0/table_info/A000008y.rf : []
~delhdfs://localhost:8020/accumulo/tables/!0/table_info/A0000090.rf : []
~delhdfs://localhost:8020/accumulo/tables/!0/table_info/F000008x.rf : []
~delhdfs://localhost:8020/accumulo/tables/!0/table_info/F0000093.rf : []

The files were eventually deleted by the Accumulo GC, leaving Accumulo in a very bad state.

2019-08-05 12:53:19,985 [gc.SimpleGarbageCollector] DEBUG: Deleting hdfs://localhost:8020/accumulo/tables/!0/default_tablet/A000008z.rf
2019-08-05 12:53:19,986 [gc.SimpleGarbageCollector] DEBUG: Deleting hdfs://localhost:8020/accumulo/tables/!0/table_info/A0000094.rf

I noticed this behavior while working on #936.

keith-turner added the blocker label Aug 5, 2019
keith-turner added this to To do in 1.10.0 via automation Aug 5, 2019
@keith-turner
Contributor Author

Below are some possible ways to fix this:

  • Disable the ability to clone the metadata table
  • Make cloning the metadata table copy files. This would have to handle race conditions with GC.
  • Rewrite the Accumulo GC to handle this.

@ctubbsii
Member

ctubbsii commented Aug 5, 2019

I think it should be simple enough for the GC to check for candidates in use by all sources. Would that be sufficient?

@keith-turner
Contributor Author

I think it should be simple enough for the GC to check for candidates in use by all sources

Conceptually it seems simple, and I cannot think of any gotchas at the moment. However, I don't think the code change would be simple.
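
For illustration only, below is a minimal sketch of the "check candidates against all sources" idea using plain Java collections rather than the actual GC code; the class and method names are hypothetical. The point is simply that a deletion candidate must survive if any source of file references, root table or metadata table, still uses it.

import java.util.HashSet;
import java.util.Set;

public class GcCandidateCheckSketch {

  /**
   * Returns the candidates that are safe to delete: those not referenced by any
   * tablet in either the root table or the metadata table.
   */
  static Set<String> safeToDelete(Set<String> candidates, Set<String> rootTableRefs,
      Set<String> metadataTableRefs) {
    Set<String> safe = new HashSet<>(candidates);
    safe.removeAll(rootTableRefs);     // files still used by metadata table tablets (refs live in the root table)
    safe.removeAll(metadataTableRefs); // files still used by user table tablets, including clones
    return safe;
  }

  public static void main(String[] args) {
    // The first ~del entry from the scan above is still referenced by the root
    // table for accumulo.metadata, so it must survive even though it is a candidate.
    Set<String> candidates =
        Set.of("hdfs://localhost:8020/accumulo/tables/!0/default_tablet/A000008z.rf");
    Set<String> rootRefs =
        Set.of("hdfs://localhost:8020/accumulo/tables/!0/default_tablet/A000008z.rf");
    Set<String> metaRefs = Set.of();
    System.out.println(safeToDelete(candidates, rootRefs, metaRefs)); // prints []
  }
}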

@keith-turner
Contributor Author

I think it should be simple enough for the GC to check for candidates in use by all sources.

I would not want to do this for 1.9. I am leaning towards throwing an error in 1.9 if an attempt is made to clone the metadata table.
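
A minimal sketch of what such a guard might look like follows; this is not the actual fix, and the class, method, and hard-coded table ids are illustrative placeholders (the real ids appear in the tables -l output below: !0 for accumulo.metadata and +r for accumulo.root).

public class CloneGuardSketch {

  // Hard-coded system table ids for illustration only.
  static final String METADATA_TABLE_ID = "!0";
  static final String ROOT_TABLE_ID = "+r";

  /** Rejects a clone request whose source is a system metadata table. */
  static void checkCloneSource(String srcTableId) {
    if (METADATA_TABLE_ID.equals(srcTableId) || ROOT_TABLE_ID.equals(srcTableId)) {
      throw new IllegalArgumentException("Cloning table " + srcTableId + " is not supported");
    }
  }

  public static void main(String[] args) {
    checkCloneSource("2");  // a user table id, allowed
    checkCloneSource("!0"); // the metadata table, throws
  }
}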

@ctubbsii
Member

ctubbsii commented Aug 6, 2019

I think it should be simple enough for the GC to check for candidates in use by all sources.

I would not want to do this for 1.9. I am leaning towards throwing an error in 1.9 if an attempt is made to clone the metadata table.

👍 This is the current behavior in 1.9 for the root table.

@mjwall
Member

mjwall commented Aug 6, 2019

Wow, this is a good catch. Should we send out a warning on user and dev?

@ctubbsii
Member

ctubbsii commented Aug 6, 2019

Wow, this is a good catch. Should we send out a warning on user and dev?

Yes, that's a good idea.

@keith-turner
Contributor Author

The following is a much simpler situation that could lead to deleting metadata table files.

root@uno> clonetable accumulo.metadata mc1
root@uno> compact -t mc1

The compaction of the clone causes ~del entries to be inserted into the metadata table for metadata table files. After the above commands are run and the Accumulo GC runs, there is a high chance of significant data loss for the metadata table.
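
Not part of the original issue, but as a hedged sketch, an administrator could look for this dangerous state with the public 1.9 client API by scanning for ~del entries that point at the metadata table's own directory (table id !0). The instance name, ZooKeeper address, and credentials below are placeholders.

import java.util.Map;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

public class FindSuspiciousDeletes {
  public static void main(String[] args) throws Exception {
    // Placeholder connection details for a local instance.
    Connector conn = new ZooKeeperInstance("uno", "localhost:2181")
        .getConnector("root", new PasswordToken("secret"));
    Scanner scanner = conn.createScanner("accumulo.metadata", Authorizations.EMPTY);
    scanner.setRange(Range.prefix("~del")); // only the delete candidate entries
    for (Map.Entry<Key,Value> entry : scanner) {
      String row = entry.getKey().getRow().toString();
      if (row.contains("/tables/!0/")) { // candidate is one of the metadata table's own files
        System.out.println("Dangerous delete candidate: " + row);
      }
    }
  }
}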

keith-turner self-assigned this Aug 6, 2019
keith-turner added a commit to keith-turner/accumulo that referenced this issue Aug 6, 2019
@keith-turner
Contributor Author

If you have cloned the metadata table in the past, I would recommend compacting the metadata table. This will cause it to have a new set of files, different from the clone's.

root@uno foo> clonetable accumulo.metadata mc1
root@uno> tables -l
accumulo.metadata    =>        !0
accumulo.replication =>      +rep
accumulo.root        =>        +r
foo                  =>         2
mc1                  =>         4
trace                =>         1
root@uno foo> scan -t accumulo.root -c file
!0;~ file:hdfs://localhost:8020/accumulo/tables/!0/table_info/A000000f.rf []    508,13
!0< file:hdfs://localhost:8020/accumulo/tables/!0/default_tablet/F000000e.rf []    338,1
root@uno foo> scan -t accumulo.metadata -c file
4;~ file:hdfs://localhost:8020/accumulo/tables/!0/table_info/A000000f.rf []    508,13
4< file:hdfs://localhost:8020/accumulo/tables/!0/default_tablet/F000000e.rf []    338,1
root@uno foo> compact accumulo.metadata
2019-08-06 14:56:27,547 [shell.Shell] INFO : Compaction of table accumulo.metadata started for given range
root@uno foo> scan -t accumulo.root -c file
!0;~ file:hdfs://localhost:8020/accumulo/tables/!0/table_info/A000000o.rf []    749,25
root@uno foo> scan -t accumulo.metadata -c file
4;~ file:hdfs://localhost:8020/accumulo/tables/!0/table_info/A000000f.rf []    508,13
4< file:hdfs://localhost:8020/accumulo/tables/!0/default_tablet/F000000e.rf []    338,1

After the compaction above, the files referenced in accumulo.root are different.

ctubbsii added this to To do in 2.1.0 via automation Aug 6, 2019
2.1.0 automation moved this from To do to Done Aug 14, 2019
1.10.0 automation moved this from To do to Done Aug 14, 2019
keith-turner added a commit that referenced this issue Aug 15, 2019
After the changes in #1309 cloning of the metadata table is no longer
allowed. TabletStateChangeIteratorIT relied on cloning and was changed
to copy instead.

Also, the test was very sensitive to concurrent changes in the metadata
table. Suspect that cloning used to introduce a delay that hid this.
The change from cloning to copying caused the test to fail often
because of these timing issues. To avoid this, the test was refactored
to tolerate concurrent changes to the metadata table.